# Zero-shot learning
**The Teacher V 2**
A zero-shot classification model based on the Transformers architecture that can classify text without fine-tuning.
Large Language Model · Transformers · by shiviktech · 196 downloads · 0 likes

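As a quick illustration of how a zero-shot classifier like this is typically used, the sketch below runs the Hugging Face transformers zero-shot-classification pipeline; the checkpoint name, example text, and candidate labels are placeholders rather than details taken from this entry.

```python
from transformers import pipeline

# Zero-shot classification sketch. "facebook/bart-large-mnli" is a widely used
# public checkpoint standing in for the listed model, which is assumed to expose
# the same zero-shot-classification interface.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The new GPU drastically cuts model training time.",
    candidate_labels=["hardware", "politics", "sports"],
)
print(result["labels"][0], result["scores"][0])  # best label and its score
```
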
**F0**
An automatically generated Transformers model card; detailed information has not yet been provided.
Large Language Model · Transformers · by vdmbrsv · 2,602 downloads · 1 like

**Sarvam Finetune**
A Transformers model published on the Hub; its specific functions and details have not yet been documented.
Large Language Model · Transformers · by jk12p · 112 downloads · 1 like

**Um P2 Fine Tuned Llama Full 2**
A Transformers model pushed to the Hub; its specific functions and uses have not yet been documented.
Large Language Model · Transformers · by ElijahLiew2 · 152 downloads · 1 like

**Yinglong 110m**
YingLong is a time series forecasting model pretrained on 78 billion time points, providing strong support for forecasting tasks.
Climate Model · Safetensors · by qcw2333 · 348 downloads · 0 likes

**Uzmi Gpt** (Apache-2.0)
GPT-2 is an open-source language model developed by OpenAI, based on the Transformer architecture and capable of generating coherent text.
Large Language Model · English · by rajan3208 · 30 downloads · 2 likes

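Since this entry is a GPT-2 derivative, a minimal text-generation sketch with the public gpt2 checkpoint is shown below; the Uzmi Gpt repository id is not given in this listing, so the upstream checkpoint is used instead.

```python
from transformers import pipeline

# Text-generation sketch with the upstream "gpt2" checkpoint; swap in the
# Uzmi Gpt repository id only if it follows the standard causal-LM layout (assumed).
generator = pipeline("text-generation", model="gpt2")

out = generator(
    "Zero-shot learning lets a model",
    max_new_tokens=40,
    num_return_sequences=1,
)
print(out[0]["generated_text"])
```
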
**Xlm Roberta Large Pooled Cap Media Minor** (MIT)
A multilingual text classification model fine-tuned from xlm-roberta-large, supporting English and Danish and focusing on political agenda and media content classification tasks.
Text Classification · PyTorch · Other · by poltextlab · 163 downloads · 0 likes

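A hedged sketch of running such a sequence-classification fine-tune follows; the repository id is inferred from the display name and may differ, and the example sentence is a placeholder.

```python
from transformers import pipeline

# Text-classification sketch. The repository id below is inferred from the
# listing's display name (an assumption); any xlm-roberta-large
# sequence-classification fine-tune is used the same way.
clf = pipeline(
    "text-classification",
    model="poltextlab/xlm-roberta-large-pooled-cap-media-minor",
)

print(clf("The government announced a new healthcare spending bill."))
```
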
**CLIP ViT L Rho50 K1 Constrained FARE2** (MIT)
A feature extraction model fine-tuned from openai/clip-vit-large-patch14, with optimized image and text encoders.
Multimodal Fusion · Transformers · by LEAF-CLIP · 253 downloads · 0 likes

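Because this model is a fine-tune of openai/clip-vit-large-patch14, the feature-extraction sketch below uses that stated base checkpoint; loading the FARE2 weights the same way is an assumption that holds only if the repository ships a standard CLIPModel configuration.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# CLIP feature-extraction sketch with the stated base checkpoint;
# "example.jpg" is a placeholder path.
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

image = Image.open("example.jpg").convert("RGB")
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="pt",
    padding=True,
)

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.image_embeds.shape)  # pooled, projected image features
print(outputs.text_embeds.shape)   # pooled, projected text features
print(outputs.logits_per_image)    # image-text similarity scores
```
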
**Style 250412.vit Base Patch16 Siglip 384.v2 Webli**
A vision model based on the Vision Transformer architecture, trained using SigLIP (Sigmoid Loss for Language-Image Pretraining), suitable for image understanding tasks.
Image Classification · Transformers · by p1atdev · 66 downloads · 0 likes

**Llama 3.1 8B AthenaSky MegaMix** (Apache-2.0)
An 8B-parameter large language model fused via MergeKit from multiple high-quality models, optimized for reasoning, dialogue, and creative generation.
Large Language Model · Transformers · English · by ZeroXClem · 105 downloads · 2 likes

**Ibm Granite.granite Vision 3.2 2b GGUF**
Granite Vision 3.2 2B is a vision-language model developed by IBM, focusing on image-to-text tasks.
Image-to-Text · by DevQuasar · 211 downloads · 1 like

**Yoloe**
YOLOE is an efficient, unified, and open model for object detection and segmentation, supporting various prompting mechanisms, including text, visual inputs, and prompt-free paradigms, achieving real-time universal visual perception.
Object Detection · by jameslahm · 40.34k downloads · 32 likes

**Bytedance Research.ui TARS 72B SFT GGUF**
A 72B-parameter multimodal foundation model released by ByteDance Research, specializing in image-text-to-text tasks.
Image-to-Text · by DevQuasar · 81 downloads · 1 like

**Whisper Large V3 Turbo** (MIT)
Whisper large-v3-turbo is an automatic speech recognition and speech translation model developed by OpenAI, trained with large-scale weak supervision and supporting multiple languages.
Speech Recognition · Transformers · Supports Multiple Languages · by Daemontatox · 26 downloads · 1 like

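The sketch below shows the usual way to transcribe audio with this family of models via the transformers ASR pipeline; the canonical openai/whisper-large-v3-turbo id is used on the assumption that this entry is a re-upload of it, and the audio path is a placeholder.

```python
from transformers import pipeline

# Automatic speech recognition sketch; "speech_sample.wav" is a placeholder
# path to any local audio file.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3-turbo")

result = asr("speech_sample.wav", return_timestamps=True)  # timestamps enable >30 s audio
print(result["text"])
```
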
**Lamarckvergence 14B** (Apache-2.0)
Lamarckvergence-14B is a pre-trained language model merged via mergekit, combining Lamarck-14B-v0.7 and Qwenvergence-14B-v12-Prose-DS. It ranks first among models with fewer than 15B parameters on the Open LLM Leaderboard.
Large Language Model · Transformers · English · by suayptalha · 15.36k downloads · 24 likes

**Depthmaster** (Apache-2.0)
DepthMaster is a refined single-step diffusion model that customizes generative features from diffusion models for discriminative depth estimation tasks.
3D Vision · English · by zysong212 · 50 downloads · 9 likes

**Minimax Text 01**
A text generation model that produces coherent text from input prompts.
Text Generation · by MiniMaxAI · 8,231 downloads · 580 likes

**Resnet101 Clip Gap.openai** (Apache-2.0)
A ResNet101 image encoder from the CLIP framework that extracts image features via global average pooling (GAP).
Image Classification · Transformers · by timm · 104 downloads · 0 likes

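As this is a timm checkpoint, a feature-extraction sketch with the timm API follows; the model name is inferred from the listing title and assumed to be available in a recent timm release, and the image path is a placeholder.

```python
import timm
import torch
from PIL import Image

# Feature-extraction sketch; num_classes=0 makes the model return pooled features.
model = timm.create_model("resnet101_clip_gap.openai", pretrained=True, num_classes=0)
model.eval()

cfg = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**cfg, is_training=False)

img = Image.open("example.jpg").convert("RGB")
with torch.no_grad():
    features = model(transform(img).unsqueeze(0))  # globally average-pooled image features
print(features.shape)
```
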
**Ioskef 23 11 06** (MIT)
A model checkpoint for the Any-to-Any subnet collaboration between OMEGA Labs and Bittensor, which targets general artificial intelligence tasks.
Large Language Model · Other · by louistvc · 0 downloads · 0 likes

**Ioskef 23 11 05** (MIT)
An Any-to-Any subnet model jointly developed by OMEGA Labs and Bittensor, focusing on general artificial intelligence tasks.
Large Language Model · Other · by louistvc · 0 downloads · 0 likes

**Florence 2 Large Ft Safetensors** (MIT)
Florence-2 is an advanced visual foundation model developed by Microsoft, employing a prompt-based architecture to unify various vision and vision-language tasks.
Image-to-Text · by mrhendrey · 162 downloads · 2 likes

**Ttm Research R2**
A compact pre-trained model for multivariate time series forecasting, open-sourced by IBM Research, with parameter counts starting from 1 million, pioneering the concept of 'tiny' pre-trained time series forecasting models.
Climate Model · Safetensors · by ibm-research · 400 downloads · 2 likes

**Show O W Clip Vit** (MIT)
Show-o is a PyTorch-based any-to-any model focused on multimodal tasks.
Text-to-Image · by showlab · 18 downloads · 2 likes

**Show O** (MIT)
Show-o is a PyTorch-based any-to-any model that supports inputs and outputs across multiple modalities.
Text-to-Video · by showlab · 225 downloads · 16 likes

**Speecht5 Base Cs Tts**
A monolingual Czech SpeechT5 base model, pre-trained on 120,000 hours of Czech audio and a 17.5 billion-word text corpus, designed as a starting point for Czech TTS fine-tuning.
Speech Synthesis · Transformers · Other · by fav-kky · 66 downloads · 0 likes

**Opensearch Neural Sparse Encoding Doc V2 Distill** (Apache-2.0)
A sparse retrieval model based on distillation, optimized for OpenSearch, supporting inference-free document encoding with improved search relevance and efficiency over V1.
Text Embedding · Transformers · English · by opensearch-project · 1.8M downloads · 7 likes

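For context on how "inference-free" sparse document encoding is typically computed with models in this family, here is a hedged sketch: the max-pooled, log-saturated ReLU over masked-LM logits is a commonly described recipe, not a copy of the project's official snippet, and special-token handling is omitted.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Approximate sparse document encoding; see the OpenSearch model card for the
# exact, officially supported code.
model_id = "opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

doc = "OpenSearch supports sparse neural retrieval."
feat = tokenizer(doc, return_tensors="pt")

with torch.no_grad():
    logits = model(**feat).logits                         # [1, seq_len, vocab_size]

weights = torch.log1p(torch.relu(logits))                 # saturate activations
weights = weights * feat["attention_mask"].unsqueeze(-1)  # zero out padding
sparse_vec = weights.max(dim=1).values.squeeze(0)         # one weight per vocab term

top = torch.topk(sparse_vec, k=10)
for tok, w in zip(tokenizer.convert_ids_to_tokens(top.indices.tolist()), top.values.tolist()):
    print(f"{tok}\t{w:.3f}")
```
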
**Controlnet Union Sdxl 1.0** (Apache-2.0)
An all-in-one ControlNet for image generation and editing, supporting 12 control conditions and 5 advanced editing functions.
Image Generation · by xinsir · 156.68k downloads · 1,413 likes

**Bitnet B1 58 Xl Q8 0 Gguf** (MIT)
BitNet b1.58 is a large language model with 1.58-bit quantization. It reduces computational resource requirements by lowering the weight precision while maintaining performance close to that of a full-precision model.
Large Language Model · Transformers · by BoscoTheDog · 326 downloads · 7 likes

**Florence 2 Large Ft** (MIT)
Florence-2 is an advanced vision foundation model developed by Microsoft, employing a prompt-based approach to handle various vision and vision-language tasks.
Image-to-Text · Transformers · by andito · 93 downloads · 4 likes

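A captioning sketch for Florence-2 follows, based on the upstream microsoft/Florence-2-large-ft checkpoint and its custom (trust_remote_code) processing code; using this re-packaged entry instead is assumed to work the same way.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Florence-2 captioning sketch; "example.jpg" is a placeholder path.
model_id = "microsoft/Florence-2-large-ft"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")
task = "<CAPTION>"  # Florence-2 selects tasks via prompt tokens such as <CAPTION>, <OD>, <OCR>

inputs = processor(text=task, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=128,
)
raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
print(processor.post_process_generation(raw, task=task, image_size=(image.width, image.height)))
```
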
**Depth Anything V2 Base Hf**
Depth Anything V2 is currently the most powerful monocular depth estimation model, trained on 595,000 synthetically annotated images and over 62 million real unlabeled images, offering finer details and stronger robustness.
3D Vision · Transformers · by depth-anything · 47.73k downloads · 1 like

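Monocular depth estimation with this checkpoint is typically a one-liner through the transformers depth-estimation pipeline; the repository id below is inferred from the listing and the image path is a placeholder.

```python
from PIL import Image
from transformers import pipeline

# Depth-estimation sketch; the model id is assumed to match this entry.
depth = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Base-hf")

result = depth(Image.open("example.jpg").convert("RGB"))
result["depth"].save("depth_map.png")   # PIL image of the predicted relative depth
print(result["predicted_depth"].shape)  # raw depth tensor
```
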
**Badger Lambda Llama 3 8b**
Badger is a Llama 3 8B instruction model generated through a recursive, maximally pairwise-disjoint, normalized denoising Fourier interpolation method, incorporating features from multiple high-quality models.
Large Language Model · Transformers · by maldv · 24 downloads · 11 likes

**Mobileclip S2 Timm**
MobileCLIP-S2 is an efficient image-text model trained with multi-modal reinforced training, achieving fast inference and strong zero-shot performance while maintaining a compact size.
Text-to-Image · by apple · 147 downloads · 4 likes

**Mmfreelm 2.7B** (Apache-2.0)
Large Language Model · Transformers · by ridger · 89 downloads · 35 likes

**Compare2score** (MIT)
Compare2Score is an image quality assessment model that produces a quality score for an input image.
Image Enhancement · Transformers · by q-future · 391 downloads · 4 likes

**Flamingo 2024** (MIT)
Released under the MIT license; further details about this model are not yet available.
Large Language Model · Transformers · by babylm · 6,526 downloads · 1 like

**Llama 3 8b Ita**
An Italian large language model built on Meta-Llama-3-8B, supporting text generation in English and Italian.
Large Language Model · Transformers · Supports Multiple Languages · by DeepMount00 · 16.00k downloads · 27 likes

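A chat-style generation sketch for an instruction-tuned Llama 3 derivative like this one is shown below; the repository id is inferred from the listing, the prompt is a placeholder, and running an 8B model locally assumes enough GPU memory (or quantization).

```python
import torch
from transformers import pipeline

# Chat-style text generation sketch (requires a recent transformers release
# that accepts chat messages directly in the pipeline).
pipe = pipeline(
    "text-generation",
    model="DeepMount00/Llama-3-8b-Ita",  # inferred repository id (assumption)
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Riassumi in due frasi cos'è lo zero-shot learning."}]
out = pipe(messages, max_new_tokens=120)
print(out[0]["generated_text"][-1]["content"])  # last message is the model's reply
```
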
**Nl2sql 7b** (Apache-2.0)
An open-source model released under the Apache-2.0 license; detailed information has not yet been provided.
Large Language Model · Transformers · by DMetaSoul · 47 downloads · 1 like

**Strangemerges 53 7B Model Stock** (Apache-2.0)
StrangeMerges_53-7B-model_stock is the result of merging multiple 7B-parameter models using LazyMergekit, and has strong text generation capabilities.
Large Language Model · Transformers · by Gille · 18 downloads · 1 like

**Bitnet B1 58 Large** (MIT)
BitNet b1.58 is a 1-bit large language model with 3 billion parameters, trained on 100 billion tokens from the RedPajama dataset.
Large Language Model · Transformers · by 1bitLLM · 10.17k downloads · 95 likes

**Bitnet B1 58 Xl** (MIT)
BitNet b1.58 3B is a 1-bit quantized large language model trained on 100 billion tokens from the RedPajama dataset, significantly reducing computational resource requirements while maintaining performance.
Large Language Model · Transformers · by 1bitLLM · 10.64k downloads · 34 likes
